Bottoms up! How pointy-headed mathematical biologists approach cell biology and what we could gain from computational biologists
Fred Adler, University of Utah

I will give several examples of how mathematical modelers painstakingly construct equations to capture the nonlinear feedbacks that characterize even simple pathways in cell biology, and the tension between this approach and the dreams created by Big Data. Based on these examples, I will conclude with thoughts on how these complementary approaches can be COMBINEd, as it were, to more effectively understand and capture complexity.

Collaborative Development of Neural Models with NeuroML
Sharon Crook, Arizona State University

The Neural Open Markup Language (NeuroML) project is an international, collaborative initiative to facilitate the exchange of complex neural models, allow for greater transparency and accessibility of models, enhance interoperability between simulators and other tools, and support the development of new software and databases. In this presentation, I will provide an overview of the latest NeuroML-based tools, examples of how NeuroML is being used by the computational neuroscience community, and the relationship between NeuroML and other neuroinformatic efforts. I will also provide examples of how NeuroML is being used as the foundation for collaborative modeling efforts at Open Source Brain.

Experimental Synthetic Biology: from concept to application
Tara Deans, University of Utah

The rapidly emerging field of synthetic biology originated in simple model organisms such as yeast and bacteria. However, as synthetic biology continues to expand into mammalian systems, it has become increasingly important to take a multidisciplinary approach for predicting, analyzing, and applying this new technology to higher organisms. Applications in synthetic biology will benefit from standards for data exchange, mathematical models, and experimental approaches. In this talk I will discuss methods for applying synthetic biology for understanding human disease and the need for standards in these experiments.

Toward methods, software, and standards for more comprehensive whole-cell models
Jonathan Karr, Fellow, Institute and Department for Genetics and Genomic Sciences, Mount Sinai School of Medicine

A central challenge in biology is to understand how phenotype arises from genotype. Despite decades of research which have produced vast amounts of data, a complete, predictive understanding of biological behavior remains elusive. Computational methods including whole-cell modeling are needed to assemble the rapidly growing amount of biological data into a unified understanding.

Recently, we developed the first whole-cell computational model (Karr et al., 2012). The model represents the dynamics of every molecular species, accounts for every known gene function, and predicts high-level behaviors such as cell cycle dynamics and growth. We believe that whole-cell models have great potential to help scientists discover new biology, help bioengineers rationally design microorganisms to perform useful functions, and help clinicians tailor therapies to individual patients.

To develop this model, we created several new methods and software tools including a modular modeling framework, a multi-algorithm simulator, model reduction approaches to parameter estimation and testing, a quantitative pathway/genome database, a simulation database, and visualization software. These new tools were initially designed for whole-cell modeling experts. These tools must be improved and standardized to make whole-cell models more transparent and reproducible, to enable more researchers to contribute to whole-cell modeling, and ultimately, to accelerate the applications of whole-cell modeling to science, engineering, and medicine.

We will present an introduction to whole-cell modeling and its applications to science, engineering, and medicine; describe our current efforts to develop more comprehensive models; outline the open challenges in whole-cell modeling; and describe our early efforts to incorporate systems biology standards into whole-cell modeling.

  • Karr JR, Sanghvi JC, Macklin DN, Gutschow MV, Jacobs JM, Bolival B, Assad-Garcia N, Glass JI & Covert MW. A whole-cell computational model predicts phenotype from genotype. Cell 150, 389–401 (2012).
  • Karr JR, Takahashi K & Funahashi A. The principles of whole-cell modeling. Curr Opin Microbiol 27, 18–24 (2015).

Why an (interactive) visualization is worth a thousand numbers
Miriah Meyer, University of Utah

The advancement of techniques and methods for collecting data has fundamentally changed how we study life and all of its complexities. But generating data is only the first step --- developing methods to make sense of vast collections of information is now widely considered the major challenge. A key component of addressing this challenge is visualization, which supports sense-making by representing data as pictures and supporting exploration through human-computer interactions. In this talk I'll discuss how we design interactive visualizations and how scientists use these tools to glean insight from complex data.

An update on Systems Biology Graphical Notation
Huaiyu Mi, University of Southern California

Extraordinary advances in sequencing and other high-throughput technologies have led to a significant increase in knowledge about biological processes. Computers and software have been used for the storage, representation and analysis of increasingly complex biological knowledge. Clear and unambiguous visual representation is crucial for scientific discovery and communication, especially between human scientists and computers. The Systems Biology Graphical Notation (SBGN) is a community standard for the visual representation of biological pathway networks and serves this purpose. Developed by a diverse community of biologists, bioinformaticians, computer scientists, ontologists and software engineers, SBGN reflects its diverse user base and their varied requirements. SBGN is defined in three complementary sublanguages that represent biological processes from different perspectives. An electronic exchange format (SBGN-ML) has also been established, and a prototype library (libSBGN) to computationally exchange, validate and manipulate SBGN diagrams has been developed. In addition, SBGN is supported by a number of tools and databases. Recently, SBGN development has focused on supporting and interoperating with other well-established community standards, such as the Systems Biology Markup Language (SBML), BioPAX and the Synthetic Biology Open Language (SBOL).

The Synthetic Biology Open Language (SBOL) - recent developments in SBOL tools and repositories
Anil Wipat, Newcastle University

Standardisation is a fundamental and defining aspect of synthetic biology. The use of standards is required at all stages of the synthetic biology lifecycle in order to promote interoperability, enable abstraction, facilitate reusable and modular design, and underpin automation. In this talk I will describe the Synthetic Biology Open Language (SBOL), a proposed standard, developed by members of the synthetic biology community, for exchanging designs of synthetic biological systems. I will give an overview of the development of SBOL since its inception in 2008, discuss the current version of the standard (SBOL v2.0), outline recent developments and highlight proposed new directions and opportunities. SBOL has stimulated the development of a wide variety of tools and repositories that support the SBOL standard, and I will also briefly review a cross-section of these tools. In particular, I will focus on tools for storing, sharing and developing synthetic biology designs that have recently been developed by my group in Computing Science at Newcastle University.

Automated Verification and Modification of DNA Sequences against DNA Synthesis Constraints
Ernst Oberortner, DOE Joint Genome Institute (JGI)

Before in silico designed DNA sequences can be synthesized, they must be verified against DNA synthesis constraints, such as limits on repeats or GC content, and modified wherever they violate them.

At the DOE Joint Genome Institute (JGI), we developed the Sequence Polishing Library (SPL) to verify and – in case of violations – to modify DNA sequences in an automated manner. Modifications depend on the region of a DNA sequence that violates a constraint. For example, if a coding sequence contains repeats, then codon juggling can be performed. However, codon juggling cannot be performed if a violation occurs in a non-coding region, such as promoters. To perform the optimal modification in the interest of the designer, we emphasize the use of standards that support meta-information about DNA sequences, such as via annotations. SPL supports the exchange of sequence information using the FASTA, GenBank, and SBOL file formats.

In addition, SPL offers a simple yet expressive language to specify DNA synthesis constraints. Our goal is to further develop this language and contribute it toward standardizing the communication of DNA synthesis constraints.
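
The kind of verification described in this abstract can be illustrated with a minimal Python sketch. It is not SPL code and does not use the SPL constraint language: the function names, the GC bounds (0.25 to 0.65) and the repeat length (8) are assumptions chosen only for illustration. The sketch reports violations but does not attempt modifications such as codon juggling.

```python
from collections import defaultdict


def gc_content(seq):
    """Return the fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def find_repeats(seq, k):
    """Map every k-mer that occurs more than once to its start positions."""
    seq = seq.upper()
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return {kmer: pos for kmer, pos in positions.items() if len(pos) > 1}


def check_constraints(seq, gc_min=0.25, gc_max=0.65, repeat_len=8):
    """Return a list of violation messages (empty if the sequence passes).

    The default thresholds are illustrative assumptions, not values
    taken from any synthesis vendor or from SPL.
    """
    violations = []
    gc = gc_content(seq)
    if not gc_min <= gc <= gc_max:
        violations.append(f"GC content {gc:.2f} outside [{gc_min}, {gc_max}]")
    for kmer, pos in sorted(find_repeats(seq, repeat_len).items()):
        violations.append(f"repeat {kmer} at positions {pos}")
    return violations
```

A caller would pass a sequence together with the thresholds relevant to its synthesis provider, for example `check_constraints(sequence, repeat_len=12)`. Deciding how to repair a violation additionally needs the annotation of the affected region (coding or non-coding), which is why the abstract stresses formats that carry such meta-information.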

On the Adoption of Standards at the DOE Joint Genome Institute
Ernst Oberortner, DOE Joint Genome Institute (JGI)

One objective of the DNA Synthesis Science Program of the DOE Joint Genome Institute (JGI) is to enable users to physically build novel biological systems. JGI aims to deliver high-quality systems to users, to increase the number of users, and to scale the size of synthetic constructs. Hence, the adoption of standards plays an integral role in the automation of JGI’s internal workflows and in the cross-organizational exchange of data and information about the biological systems.

Here, we demonstrate the state-of-the-art workflows at the DOE JGI and the data exchange formats used in them. Working towards automated workflows has shed light on the limitations of internally developed solutions and on why the adoption of current standards is not yet a feasible solution. We believe, however, that working closely with standardization communities will contribute (1) to the development of appropriate and easy-to-adopt standards, (2) to the science and the strategic vision of the DOE JGI, and (3) to lowering the entry barrier for new users and the community at large.

Developing the Virtual Physiological Human: tools, techniques, and best practices for data exchange, storage, and publication
David Nickerson*, Tommy Yu, Hugh Sorby, Alan Garny, Poul Nielsen, Peter Hunter, Auckland Bioengineering Institute, University of Auckland

We present here tools, techniques and best practices that aid scientists in the development and application of mathematical models and computational simulation experiments in their work toward the creation of a virtual physiological human. In the examples to be presented we make use of the Physiome Model Repository (PMR) and the software tools OpenCOR and MAP Client. PMR provides a framework for the storage, curation, description, and exchange of data. By using standards appropriate for their data, scientists maximize their ability to reuse existing knowledge and enable others to make use of their achievements in novel work. We will also briefly discuss the use of these tools in teaching new PhD students fundamental concepts underlying computational physiology.

Managing model complexity using CellML and Physiome Model Repository
David Nickerson*, Tommy Yu, Poul Nielsen, Peter Hunter, Auckland Bioengineering Institute, University of Auckland

Mathematical models of biological systems can easily grow into large and highly complex sets of mathematical equations. Comprehension of such complex models is significantly improved by abstracting a given model into various sub-models with clearly defined interfaces and restricting communication between sub-models to those interfaces. The CellML format has well-established mechanisms that allow models to be expressed in this manner. Dividing the model into many parts, however, raises the technical issue of synchronising the many sub-models as they each evolve and are added to or removed from the main model over time. We present here the features by which the Physiome Model Repository (PMR) can be used not only to manage the development of these complex models over time but also to enhance the ability to share each model or sub-model with collaborators or the scientific community at large.

A field guide to automated cloning for accelerated bioengineering
Xingjian Xu, Almer van der Sloot, Raik Grünberg, University of Montreal

The assembly of larger DNA constructs, often from a mix of gene synthesized and in-house fragments, is the starting point for most synthetic biology projects. Despite major technical advances, DNA assembly remains a bottleneck in many laboratories. Fully automated robotic cloning has been achieved within selected companies but has traditionally been considered too expensive and complex for academic settings.

We have implemented a robotic synthetic biology workstation that automates the complete multi-fragment DNA assembly workflow. This includes fragment PCR setup, cleanup, Gibson assembly, transformation, spreading on ANSI/SLAS-format microplates, robust colony picking, colony PCR, DNA miniprep and auxiliary steps. Usability and user adoption are facilitated by a strictly modular design as well as by convenient configuration through MS-Excel tables.

The much higher experimental throughput now creates new bottlenecks with associated standardization demands. Manual assembly planning and primer design become increasingly impractical. Sample tracking and management as well as recording of quality control data pose additional challenges. We discuss experiences with automated assembly design (j5) and an in-house sample tracking solution ("Rotten Microbes"). We argue that this area of synthetic biology could benefit greatly from data exchange standardization.

The Cell Behavior Ontology: describing the biological behaviors of real and simulated cells seen as spatially active agents
James P. Sluka*, Sherry G. Clendenon, Maciej Swat and James A. Glazier

Biocomplexity Institute, Indiana University, Bloomington, Indiana, USA

The lack of a formalized method for describing the spatiality and intrinsic biological behaviors of cells makes it difficult to adequately describe cells, tissues and organs as spatial objects in living tissues, in vitro assays and in computational models of tissues. The Cell Behavior Ontology (CBO) describes multi-cell systems. In particular, it describes (1) the spatiotemporal aspects of multi-cell systems, (2) the existential behaviors (growth, movement, adhesion, death ...) and (3) computational models of those behaviors. The CBO is an OWL-2 ontology that can describe both observable cell systems such as histological sections and in silico multicell systems. CBO provides a basis for describing the spatial and observable behaviors of cells and extracellular components suitable for describing in vivo, in vitro and in silico multicell systems. Using the CBO, a modeler creates a meta-model of a simulation of a biological system and links that meta-model to experiment and/or simulation results. Annotation of a multicell model and its computational representation, using the CBO, makes the statement of the underlying biology explicit. The CBO can also be used to describe the components in microscopic images of tissues. The formal representation of such biological abstraction facilitates the validation, falsification, discovery, sharing and reuse of both models and experimental data. Since the CBO can explicitly treat spatiality at the cell scale, it can be used as a mapping schema for big data repositories of microscopic images (and movies) of biological materials. This allows us to write SPARQL queries, based on the CBO, that can be used to query large relational databases of annotated biological images.

Divide and Conquer: Using SBML to represent both the whole body and subcellular scales in a liver-centric multiscale model
James P. Sluka*, Xiao Fu, Maciej Swat and James A. Glazier

Biocomplexity Institute, Indiana University, Bloomington, Indiana, USA

Pharmacological and toxicological processes occur across a wide range of spatial and temporal scales and include multiple organ systems. An in silico pharmacological model must include submodels that cover these multiple scales and multiple tissues relevant to human medicine and toxicology. We have developed a liver-centered, mechanism based, multiscale in silico simulation framework for xenobiotic toxicity and metabolism that incorporates four key biological scales: (1) Population variation scale, (2) Physiologically-Based Pharmacokinetic (PBPK) whole body scale, (3) Tissue and multicellular scale, and (4) Sub-cellular signaling and metabolic pathways scale. Our multiscale in silico framework focuses on the liver, a critical organ in many toxicological, pharmacological, normal and disease processes. We make extensive use of models written in SBML and couple multiple copies of two distinct SBML models to our multi-cell based simulation. These SBML models represent both the whole-body scale and the subcellular reaction kinetic (metabolic) scales. The use of SBML format for both the largest and smallest scales allows us to leverage existing models and modelling tools for the two widely different scales. In addition, the SBML models can be run as stand-alone models allowing us to refine the models individually using existing SBML tools for tasks such as parameter fitting and sensitivity analysis.

Using Atomizer to analyze and debug reaction-network models
Jose-Juan Tapia, John A. P. Sekar and James R. Faeder; University of Pittsburgh

We have previously developed a tool called Atomizer that allows us to extract implicit assumptions from reaction-network models, like those encoded in the Systems Biology Markup Language (SBML). This is done by performing stoichiometric and lexical analysis, combined with annotation information, over the reactions and species in a model, which leads to the identification of structural sites that mediate biochemical interactions between the reactants in a model. Moreover, the atomization of reactions indicates how a reaction involving one site in one molecule is affected by the status of its neighboring sites in the same molecule. This contextual information is indicative of the set of assumptions that were made in the formulation of a model. Here, we have developed novel methods for analyzing and visualizing these contextual interactions, which are important not only for understanding the assumptions made in a model but can also be useful for debugging models and revealing other inconsistencies.

Build and View Rule-Based Models With Simmune Modeler and Network Viewer
Fengkai Zhang1, Hsueh-Chien Chen1,2, Bastian Angermann1, Martin Meier-Schellersheim1
1. Laboratory of Systems Biology, National Institute of Allergy and Infectious Diseases, NIH, USA
2. Department of Computer Science, University of Maryland, USA

Rule-based modelling is an approach that defines rules for interactions between pairs of molecule binding sites, specifying how the interactions depend on particular states of the molecules and their location in specific compartments. Defining a rule-based model typically requires writing scripts, which impacts the accessibility of the benefits of rule-based modelling to non-specialists. Simmune is a software package providing a visual interface to create rule-based models of cellular signalling networks in an intuitive way with iconographic symbols of molecules and their properties. The models can be visualized as biochemical reaction networks through a variety of views: as global networks, local sub-networks and with detailed rendering of the underlying reaction rules. Using several real-world biological models we will illustrate the process of model creation with the Simmune Modeler, network visualization with the Simmune Network Viewer, and model export in SBML and SBML Multi formats. This work is supported by the intramural program of the NIAID, NIH.

New Standard Resources for Systems Biology: BiGG 2 Database and Visual Pathway Editing with Escher
Andreas Dräger1,2, Zachary A. King2, Justin S. Lu2, Ali Ebrahim2, Nikolaus Sonnenschein3, Philip C. Miller2, Joshua A. Lerman4, Bernhard O. Palsson5,6,7 and Nathan E. Lewis7

1Center for Bioinformatics Tuebingen (ZBIT), University of Tuebingen, Tübingen, Germany, 2Bioengineering, University of California, San Diego, La Jolla, CA, USA, 3Technical University of Denmark, Novo Nordisk Foundation Center for Biosustainability, Hørsholm, Denmark, 4Total New Energies USA, Inc., Amyris, Inc., Emeryville, CA, USA, 5Department of Bioengineering, University of California, San Diego, La Jolla, CA, USA, 6Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Lyngby, Denmark, 7Department of Pediatrics, University of California, San Diego, La Jolla, CA, USA

Background: Genome-scale metabolic network reconstructions enable the simulation and analysis of complex biological networks, thus providing insights into how thousands of genes together influence cell phenotypes. Accuracy in systems biology research requires standards in model construction, a variety of specific software tools, and access to high-quality metabolic networks.

Results: To meet these needs, we present the BiGG 2 database and a collection of software solutions for model building, curation, visualization, and simulation. BiGG 2 currently contains 77 high-quality manually-curated genome-scale metabolic network reconstructions, which can be easily searched and browsed and include interactive pathway map visualizations.
These visualizations have been generated with the web-based Escher pathway builder. Escher allows users to draw pathways in a semi-automated way and can visualize data related to genes or proteins that are associated with pathways. An export function facilitates storing Escher maps in the community formats SBML and SBGN-ML. These features make Escher an ideal interactive model development tool.
In order to make all models in BiGG 2 MIRIAM compliant, BiGG 2 itself has become part of the MIRIAM registry and provides links to a plethora of external databases for each model component. This rich annotation enables rapid comparison across models. New Systems Biology Ontology terms have been defined that are used to better highlight the role of model components. A comprehensive web API for programmatically accessing the database content enables interfacing with diverse modeling and analysis tools.
Conclusions: With these features and tools, BiGG 2 provides a valuable database, structured for easy access and to help improve the quality, standardization, and accessibility of all genome-scale models. The development of this resource has boosted the development of community standards for constraint-based modeling.
Availability: http://bigg.ucsd.edu, https://escher.github.io

Tagir Valeev1,3, Nikita Mandrik2,4,*, Sofiya Kinsht5, Fedor Kolpakov1,2

1Institute of Systems Biology, Ltd., Novosibirsk, Russia
2Design Technological Institute of Digital Techniques, SBRAS
3Institute of Informatics Systems, SBRAS
4Sobolev Institute of Mathematics, SBRAS
5Novosibirsk State University, Novosibirsk, Russia

BioUML IDE combines the capabilities of the BioUML platform and the NetBeans IDE. First, it provides an integrated code editor. For JavaScript, for example, the editor indents lines, matches words and brackets, highlights source code syntactically and semantically, and provides intelligent code completion. It can also be extended to support other languages. Second, a wide set of community-provided plug-ins is available for NetBeans. Third, updates are automatic: the user does not need to check the web for a new version, because the IDE gives notice when updates are available. There is also integration with Git, which can be useful in users' projects. Finally, the UI is convenient and customizable.

SED-ML as BioUML workflow
Ivan Evshin1,2, Ilya Kiselev1,2,*, Fedor Kolpakov1,2

1Institute of Systems Biology, Ltd., Novosibirsk, Russia
2Design Technological Institute of Digital Techniques, SBRAS

SED-ML was designed to describe simulation experiments on computational models in a formal way that facilitates the publication and reproducibility of results. A SED-ML document contains references to models and the various steps needed to change a model, simulate it and prepare the final results. Scientists, on the other hand, often use a workflow formalism to describe a series of computational or data manipulation steps. BioUML provides a workflow management system, which is handled intuitively through a simple drag-and-drop interface. Here we introduce the representation of a SED-ML document as a workflow in the BioUML system.

A BioUML workflow describes how a computation proceeds in the form of a directed graph, where each analysis node represents a task to be executed and edges represent either data flow or execution dependencies between tasks. To represent SED-ML as a workflow, we have created several special BioUML analyses. The “Download model” analysis fetches the model from a given URI and imports it into the BioUML repository. The “Change diagram” analysis accepts a model and outputs the model with specific changes applied. Unlike SED-ML, which describes model changes at the level of XML, we describe changes at the level of objects. The “Simulation analysis” performs a time course or one-step simulation of the given model and outputs the simulation results. “Algebraic steady state” computes the steady state of the input model. The “Generate report” analysis accepts a simulation result and produces the specified charts and tables. Workflow cycle nodes are used to represent SED-ML repeat elements, and the “Merge simulation results” analysis is used to accumulate simulation results from cycle iterations. To support the SED-ML "resetModel=false" feature, the analysis “Set initial values from simulation result” is used to change the model's initial values to the values from a given simulation result.

To verify the correctness of workflows generated from SED-ML, we ensure that the examples from the official specification and the libsedml software give reliable results.
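
The idea of a workflow as a directed graph, with nodes as tasks and edges as dependencies, can be sketched in a few lines of Python. This is an illustration only and does not use BioUML or SED-ML code: the node names, the toy decay model and the `run_workflow` helper are all hypothetical, and the sketch covers only a linear chain without cycle nodes.

```python
from graphlib import TopologicalSorter


def run_workflow(tasks, dependencies):
    """Run tasks in an order that respects their dependencies.

    tasks: maps a node name to a function taking a dict of upstream results.
    dependencies: maps a node name to the set of nodes it depends on.
    """
    results = {}
    # static_order() yields each node only after all of its predecessors.
    order = list(TopologicalSorter(dependencies).static_order())
    for name in order:
        upstream = {dep: results[dep] for dep in dependencies.get(name, ())}
        results[name] = tasks[name](upstream)
    return order, results


# Hypothetical three-node chain: fetch a model, simulate it, report on it.
tasks = {
    # Stand-in for fetching a model: returns toy parameters.
    "download": lambda inputs: {"k": 0.5, "x0": 10.0},
    # Stand-in for a simulation: three steps of discrete decay.
    "simulate": lambda inputs: [
        inputs["download"]["x0"] * (1 - inputs["download"]["k"]) ** step
        for step in range(3)
    ],
    # Stand-in for report generation: formats the last simulated value.
    "report": lambda inputs: f"final value: {inputs['simulate'][-1]}",
}
dependencies = {
    "download": set(),
    "simulate": {"download"},
    "report": {"simulate"},
}

order, results = run_workflow(tasks, dependencies)
```

Here each task receives only the results of the nodes it depends on, which mirrors the data-flow reading of the edges. Repeated tasks, such as those expressed by SED-ML repeat elements, would need an explicit loop construct on top of this ordering.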

Improved SBGN (ML) support in BioUML
Ilya Kiselev1,2,*, Sofiya Kinsht1,3, Fedor Kolpakov1,2

1Institute of Systems Biology, Ltd., Novosibirsk, Russia
2Design Technological Institute of Digital Techniques, SBRAS
3Novosibirsk State University, Novosibirsk, Russia

In BioUML, a mathematical model is represented as a visual diagram. Each element of the diagram may be associated with an element (variable, equation) of the mathematical model. BioUML provides several visual notations, including SBGN. A diagram should be semantically correct according to both SBML and SBGN rules, e.g. each reaction should have at least one reactant or product, and each arc should be connected to an appropriate glyph. This is checked automatically as the user creates the model. The process relies on conversion between SBML and SBGN in both directions, which was implemented in BioUML earlier.

Recently we have significantly improved support of SBGN and SBGN-ML in BioUML.

  1. Logical operator is supported.
  2. Phenotype is fully supported.
  3. Convenient way to change color, border and font for each particular diagram element.
  4. Creation of styles for similar customization of groups of elements.
  5. Corrected visualisation of process glyph.
  6. BioUML-specific elements “subdiagram” and “port” replaced by SBGN “submap” and “tag”.
  7. Draw on the fly experimental option.
  8. SBGN-ML export and import.
  9. BioUML-specific annotation providing lossless reimport back to BioUML.
  10. Support of render annotation for SBGN-ML.

A user may import an SBGN-ML document, provide reaction rates and initial values, and run a simulation or export the model as SBML. Thus we have implemented the chain SBML ↔ (SBML + SBGN) Diagram ↔ SBGN-ML.

There are a few things that are not translated from SBML to SBGN (ML) or vice versa.

  1. Phenotype SBGN element has no mathematical meaning and is not translated to SBML.
  2. Equations, events and functions are represented in simple BioUML-specific notation.
  3. Parameters are not represented visually.

Automated design of combinational logic circuits in bacteria
Bryan Der, MIT

Living cells can sense and respond to changes in a variety of environmental signals. So far, engineering new information processing circuits to control these conditional responses has been a challenging and time-consuming process. We have developed a library of insulated genetic logic gates and a software design environment called Cello, which allow electronic design specifications to be automatically converted to a complete DNA sequence that executes the program in bacterial cells. Cello was used for automated design of 60 circuits, where 44 functioned correctly in the first experimental implementation. This result represents a significant advancement in the scale and success rate of genetic circuit design. To enable broad access, we implemented a web application (www.cellocad.org) where users can design logic functions of interest using an intuitive interface. Users also have the option to upload data using a constraints file describing custom sensors, logic gates, and actuators to build circuits in other experimental conditions and cell types of interest. We envision Cello providing a flexible and robust design environment for engineering circuits with diverse gates in diverse cell types.

The whole-cell network of Mycoplasma genitalium
Paulo E. P. Burke, Claudia B. L. Campos, Marcos G. Quiles, Federal University of Sao Paulo

Several interactomes, such as Protein Interaction Networks, Metabolic Networks, and Gene-Expression Regulatory Networks, have been developed to model distinct cellular processes, but these processes are in fact all interconnected. To elucidate these interactions, we designed a set of rules that allowed the development of an integrative interactome aimed at connecting every cellular process into a single complex network, i.e., a whole-cell network. As an example, we built the first whole-cell network for the bacterium Mycoplasma genitalium on the basis of literature and database information. The resulting model contains 2,740 intracellular entities, including genes, RNAs, proteins, and metabolites, as well as 4,348 reactions. We provide the M. genitalium whole-cell network in SBML and GML file formats. We believe that this dataset may be useful in many research fields, including synthetic biology, systems biology, and network science.

Reaction kinetics database SABIO-RK
Andreas Weidemann*, Ulrike Wittig, Maja Rey, Renate Kania, Martin Golebiewski, Wolfgang Müller, HITS gGmbH, Heidelberg, Germany

SABIO-RK (http://sabio.h-its.org/) is a manually curated database containing data about biochemical reactions and their kinetic properties. These data are mainly based on information reported in the scientific literature, which is manually extracted and stored in a structured format. The data are expanded by including annotations to controlled vocabularies, ontologies and external databases. SABIO-RK supports modellers, as well as experimentalists, in the very time-consuming process of searching and collecting information from publications. SABIO-RK also offers direct data upload from lab experiments and supports SBML for import and export of data and models.

A GPU-based 3D Integrated Complex-fluid Toolbox for Modeling Cellular Dynamics
Jia Zhao, Department of Mathematics, University of North Carolina at Chapel Hill
Qi Wang, Department of Mathematics, University of South Carolina, Columbia

Cells are fundamental units in all living organisms since animals and plants are all made up of millions of cells of different varieties. The study of cells is therefore an essential part of research in life science.

In this talk, we take into account the hydrodynamic interactions in cellular dynamics by modeling cells as a complex-fluid mixture with multiple components, i.e., we treat the nucleus, cytosol, cortex, membrane and extracellular matrix as viscoelastic fluids with different rheological properties. A GPU-based 3D integrated complex-fluid toolbox has been developed for high-performance computing. The data are stored in HDF5 format, and VisIt is then used for post-processing and visualization. The toolbox has been adapted to study biofilm formation (a biofilm being a community of bacteria) and animal cell mitosis (where a mother cell divides into two offspring cells). Verified against experimental data, this integrated code package is an effective in silico tool for analyzing cellular dynamics.

Data-driven parameter inference for gene circuit modeling
Linh Huynh and Ilias Tagkopoulos, University of California, Davis

Mathematical modeling and numerical simulation are crucial to support design decisions in synthetic biology. A major challenge is the accurate inference of parameter values, as measuring them directly from experiments is difficult. For that reason, their values are usually estimated by fitting a model to available experimental data from one or more biological systems. However, this estimation tends to be inaccurate and to overfit, due to the sloppiness of biological models. To address this challenge, we propose a new approach that utilizes multiple datasets to infer the parameter values with uncertainty quantification. Our preliminary results demonstrate the efficiency of this approach for both synthetic data and real gene expression data compiled from the literature.

The NormSys registry for modeling standards in systems and synthetic biology
Martin Golebiewski, Alexander Nikolaew, Nils Woetzel, Jill Zander, Heidelberg Institute for Theoretical Studies (HITS), Heidelberg (Germany)

Different stakeholders need to be engaged in the standardization process to incorporate their specific requirements: researchers from academia and industry with their grass-roots standardization communities like COMBINE, as well as representatives of standardization bodies (e.g. the International Organization for Standardization, ISO), scientific journals and research funding agencies. The project NormSys aims at enhancing and promoting the formal standardization of existing modeling community standards by building a bridge between stakeholder groups and developing the means for transferring information about community standards between them. To survey standard formats for computational modeling in biology such as SBML, CellML, SBGN, SED-ML, SBOL, NeuroML, PharmML and others, we are developing a registry which not only lists the standards, but also compares their major features, their possible fields of biological application and use cases (including model examples), as well as their relationships, commonalities and differences. This NormSys registry for modeling standards provides a common entry point for modelers and software developers who plan to apply the standards to their respective applications, and provides them with detailed information and links to the standards, their specifications and APIs.

A combined systems and structural modeling approach repositions antibiotics for Mycoplasma genitalium
Denis Kazakiewicza,b, Jonathan R. Karrc, Karol M. Langner d,e, Dariusz Plewczynski b,f,g

aCenter for Statistics, Universiteit Hasselt, Hasselt BE3500, Belgium, bCenter for Innovative Research, Medical University of Bialystok, Bialystok 15-089, Poland, cDepartment of Genetics & Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York NY 10029, USA, dDepartment of Molecular Physiology & Biological Physics, University of Virginia, Charlottesville VA 22908, USA, eCurrent address: Google Inc., Mountain View CA 94043, USA, fCentre of New Technologies, University of Warsaw, Warsaw 02-097, Poland, gThe Jackson Laboratory for Genomic Medicine, Farmington CT 06030, USA

Bacteria are increasingly resistant to existing antibiotics. New methods are needed to identify targets, including repositioning targets among distantly related species. In this talk I present a method that combines systems modeling, structural modeling and bioinformatics to reposition known antibiotics and targets to new species. We applied this approach to Mycoplasma genitalium. First, we used quantitative metabolic modeling to identify enzymes whose expression affects the cellular growth rate. Second, we searched the literature for inhibitors of homologs of the most fragile enzymes. Lastly, we used molecular docking to verify that the reported inhibitors preferentially interact with M. genitalium proteins over their human homologs. Thymidylate kinase was the top predicted target and piperidinylthymines were the top compounds. In summary, combined systems and structural modeling is a powerful tool for drug repositioning.

SBOL visual: introduction, recent developments, and current challenges
Jacqueline Quinn, Autodesk Research, Autodesk Inc. (currently Google Inc.), San Francisco, CA, United States

Robert Sidney Cox III, Chemical Science and Engineering, Kobe University, Kobe, Japan
Aaron Adler, Information and Knowledge Technologies, Raytheon BBN Technologies, Cambridge, MA, United States
Jacob Beal*, Information and Knowledge Technologies, Raytheon BBN Technologies, Cambridge, MA, United States
Swapnil Bhatia, Electrical and Computer Engineering, Boston University, Boston, MA, United States
Yizhi Cai, School of Biological Sciences, University of Edinburgh, Edinburgh, United Kingdom
Joanna Chen, Fuels Synthesis and Technologies Divisions, Joint BioEnergy Institute, Emeryville, CA, United States; Lawrence Berkeley National Lab, Berkeley, CA, United States
Kevin Clancy, Synthetic Biology Unit, ThermoFisher Scientific, Carlsbad, CA, United States
Michal Galdzicki, Arzeda Corp, Seattle, WA, United States
Nathan J. Hillson, Fuels Synthesis and Technologies Divisions, Joint BioEnergy Institute, Emeryville, CA, United States; Lawrence Berkeley National Lab, Berkeley, CA, United States
Nicolas Le Novère, Babraham Institute, Cambridge, United Kingdom
Akshay J Maheshwari, Stanford University School of Medicine, Stanford, CA, United States
James Alastair McLaughlin, Computing Science, Newcastle University, Newcastle upon Tyne, United Kingdom
Chris J. Myers, Department of Electrical and Computer Engineering, University of Utah, Salt Lake City, UT, United States
Umesh P, Department of Computational Biology & Bioinformatics, University of Kerala, Kerala, India
Matthew Pocock, Computing Science, Newcastle University, Newcastle upon Tyne, United Kingdom; Turing Ate My Hamster LTD, Newcastle upon Tyne, United Kingdom
Cesar Rodriguez, Department of Biomedical Sciences, College of Medicine, Florida State University, Tallahassee, FL, United States
Larisa Soldatova, Computer Science, Brunel University, London, United Kingdom
Guy-Bart V Stan, Department of Bioengineering, Centre for Synthetic Biology and Innovation, Imperial College London, South Kensington Campus, London, United Kingdom
Neil Swainston, Centre for Synthetic Biology of Fine and Specialty Chemicals (SYNBIOCHEM), University of Manchester, Manchester, United Kingdom
Anil Wipat, School of Computing Science, Newcastle University, Newcastle upon Tyne, United Kingdom
Herbert M Sauro, Bioengineering, University of Washington, Seattle, WA, United States

Synthetic Biology Open Language (SBOL) Visual is a graphical standard for genetic engineering. It consists of symbols representing DNA subsequences, including regulatory elements and DNA assembly features. These symbols can be used to draw illustrations for communication and instruction, and as image assets for computer-aided design. SBOL Visual provides prototypical symbol images, which have been used in scientific publications and software tools, and a community process for addition and refinement of symbols. This talk will provide an introductory overview of SBOL Visual, along with discussion of recent developments, its relationship to SBGN, and current challenges that the community is working on.

Development of Standards for Calibrated Flow Cytometry
Jacob Beal, Raytheon BBN Technologies

Flow cytometry is a remarkable technique, and the most accessible means we have at present for high-throughput measurement of the distribution of behaviors of individual cells in a population. Unfortunately, typical practice yields measurements in arbitrary units. Absent comparable units, quantitative information cannot be shared and reproduced, yet it is essential to establish reproducibility of the distribution of cell behaviors within a population across instruments, laboratories, and data sets, or across different channels within a single data set. Recent developments, however, enable calibration of all measurement channels of a flow cytometer to identical, reproducible units. As such, a working group in the NIST Synthetic Biology Standards Consortium is now developing two standards documents for calibrated flow cytometry: one that specifies minimal information for reproducibility of flow cytometry measurements, and another providing a recommended set of protocols and practices for reliably obtaining such information.

Steps Towards a Curated SBOL Repository: Recreating and Annotating Models from Literature
Zach Zundel, University of Utah

The Synthetic Biology Open Language (SBOL) is a standard for expressing genetic constructs and their interactions. We recreate models from the literature in SBOL in order to establish a standard workflow for the creation and curation of SBOL models. Several tools will be used to create unique components and to model specific interactions, so that the efficacy of each tool and its ability to fulfill its designed role in the creation of SBOL models can be examined and refined.

Theoretical Study for Hydrogen Bonding: A Model for Biological Systems
Mohamed Ayoub, University of Wisconsin-Washington County

Hydrogen bonding is of eminent importance for the structure, function, and dynamics of a vast number of chemical systems, spanning chemistry, structural biology, molecular medicine, and materials science. In this work we explore the physical basis of simple hydrogen-bonded dimers using DFT (B3LYP) with the aug-cc-pVTZ basis set and natural bond orbital (NBO) analysis. We computed several theoretical and experimental descriptors for each dimer, such as H-bond energies, charge transfer, H-bond distances, the elongation of the HA bond and the red shift of the HA stretching frequency. In addition, we generate overlap plots for the hydrogen-bonded dimers in 2-d and 3-d, which support the computed bond strengths and binding energies, and hence the stability of dimer formation.

BDML: an open format for representing quantitative biological dynamics data
Koji Kyoda*, Yukako Tohsato, Kenneth H.L. Ho, Shuichi Onami, RIKEN Quantitative Biology Center

BDML (Biological Dynamics Markup Language) is an open unified format for representing quantitative data on spatiotemporal dynamics of biological objects from molecules to cells to organisms. Such data are often generated by bioimage informatics or mechanobiological modeling techniques. BDML is based on Extensible Markup Language (XML). Its machine-readability and extensibility enable efficient development of software tools for data visualization and analysis. We have already provided over 300 BDML datasets, including those of embryogenesis in C. elegans, D. melanogaster, zebrafish and mouse, in the SSBD (Systems Science of Biological Dynamics) database. We also developed BDML-compatible software tools for data visualization and analysis. We believe that BDML facilitates a novel approach to analyzing quantitative biological dynamics data, one that yields mechanistic insight into the biological dynamics.

Semantic Annotation with SBML and CellML Models
John H. Gennari, Maxwell Neal, Daniel Cook, Brian Carlson, University of Washington & University of Michigan

Although standards for biosimulation models are well-established, with readily available libraries containing thousands of models, the ability to intelligently search across these resources is not available. A goal of our research group is to support intelligent search capabilities across (for example) both the CellML repository and the BioModels database. By “intelligent search”, we mean search capabilities beyond string searches, such as searches that look for individual entities of interest, e.g. “all models that include or compute the chemical concentration of enzyme X within the mitochondrion.”
To do this, we need semantic annotations at the individual variable level; current annotation efforts, while laudable, are insufficient for our needs. We have developed methods for composite annotation and software (SemGen) to support search and semantic annotation in a manner that leverages and builds from existing standards such as CellML, BioPAX, and SBML. In this talk, we review our methods and software, and then describe how we can apply semantic composite annotations to BioModels (SBML) and the Physiome Model Repository (CellML).
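The difference between a string search and a search over variable-level composite annotations can be sketched as follows. This is only an illustration under stated assumptions: the `CompositeAnnotation` class, the `MODEL_INDEX` contents and the plain-text terms are hypothetical and do not reflect SemGen's actual data structures or any ontology identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompositeAnnotation:
    """Hypothetical variable-level annotation: a property of an entity in a location."""
    physical_property: str  # e.g. "chemical concentration"
    entity: str             # e.g. "enzyme X"
    location: str           # e.g. "mitochondrion"


# Hypothetical index: model name -> {variable name -> annotation}.
MODEL_INDEX = {
    "model_A": {
        "c_X_mito": CompositeAnnotation("chemical concentration", "enzyme X", "mitochondrion"),
        "V_cyt": CompositeAnnotation("fluid volume", "cytosol", "cell"),
    },
    "model_B": {
        "X_cyt": CompositeAnnotation("chemical concentration", "enzyme X", "cytosol"),
    },
}


def find_models(index, query):
    """Return sorted names of models with at least one variable matching the query exactly."""
    return sorted(
        name for name, variables in index.items()
        if query in variables.values()
    )


query = CompositeAnnotation("chemical concentration", "enzyme X", "mitochondrion")
print(find_models(MODEL_INDEX, query))
```

A string search for "enzyme X" would match both toy models, whereas the structured query matches only the model whose variable describes the concentration inside the mitochondrion.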

STON translator: SBGN to Neo4j graph database
Vasundra Touré*1,2,3, Alexander Mazein1, Irina Roznovat1, Dagmar Waltemath3, Ron Henkel3, Mansoor Saqi1, Johann Pellet1 and Charles Auffray1

1 European Institute for Systems Biology and Medicine (EISBM), Centre National de la Recherche Scientifique (CNRS), Campus Charles Mérieux - Université de Lyon - 50 Avenue Tony Garnier, 69007 Lyon, France; *IMI-eTRIKS consortium
2 Université Paris-Sud, UFR Sciences Bât. 301, 91405 Orsay cedex, France
3 Department of Systems Biology and Bioinformatics, University of Rostock, 18051 Rostock, Germany

Graph databases can be successfully applied in Systems Biology for managing extensive and complex heterogeneous information. Ultimately, graphs are a natural way of representing biological networks. A graph database like Neo4j can thus often provide a better response time and it enables efficient storage, processing and querying of biological networks.

Here we present STON (SBGN TO Neo4j), a Java-based framework to import and translate metabolic, signalling and gene regulatory pathways presented in SBGN Process Description and SBGN Activity Flow languages to a graph-oriented format compatible with Neo4j. Exploiting the power of a graph database opens new opportunities for combining different layers of granularity and for identifying functional sub-modules in the network. Further extensions are planned and will allow merging pathways into larger networks while taking into account possible overlapping areas of the network. The framework is freely available on SourceForge: http://sourceforge.net/projects/ston/.
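The kind of translation described here can be sketched in a few lines of Python. This is a minimal illustration and not STON itself (which is Java-based): the node labels, property names and relationship types below are assumptions chosen for the example, and the toy map uses only standard SBGN Process Description class names.

```python
# Toy SBGN Process Description map: glyphs (nodes) and arcs (edges).
glyphs = [
    {"id": "g1", "class": "simple chemical", "label": "glucose"},
    {"id": "g2", "class": "simple chemical", "label": "glucose-6-phosphate"},
    {"id": "g3", "class": "macromolecule", "label": "hexokinase"},
    {"id": "p1", "class": "process", "label": ""},
]
arcs = [
    {"class": "consumption", "source": "g1", "target": "p1"},
    {"class": "production", "source": "p1", "target": "g2"},
    {"class": "catalysis", "source": "g3", "target": "p1"},
]


def to_label(sbgn_class):
    """Turn an SBGN class name into a CamelCase node label (assumed naming scheme)."""
    return "".join(word.capitalize() for word in sbgn_class.split())


def to_cypher(glyphs, arcs):
    """Emit one Cypher CREATE statement per glyph, then one per arc."""
    statements = []
    for g in glyphs:
        statements.append(
            "CREATE (:%s {id: '%s', label: '%s'})"
            % (to_label(g["class"]), g["id"], g["label"])
        )
    for a in arcs:
        statements.append(
            "MATCH (s {id: '%s'}), (t {id: '%s'}) CREATE (s)-[:%s]->(t)"
            % (a["source"], a["target"], a["class"].upper())
        )
    return statements


statements = to_cypher(glyphs, arcs)
for line in statements:
    print(line)
```

Keeping processes as nodes of their own, rather than collapsing them into edges, preserves the bipartite structure of a Process Description map in the graph.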

Efficient Analysis of SBML Models Using Arrays
Leandro Watanabe, University of Utah

The leading standard for modeling biological systems is SBML. Although SBML has been successful in representing simple biochemical models, the standard lacks the structure for representing large complex systems such as a whole-cell model. Such a model requires a large number of variables to represent certain aspects of the cell, such as the genome, and SBML is not designed to do so. To address this deficiency, the arrays package has been proposed to represent regular structures more easily. However, in order to take full advantage of the package, analysis methods need to be aware of the array structure. When a model is flattened, some of the advantages of using arrays are lost. This paper describes a more efficient way to simulate arrayed models.
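The contrast between flattening and array-aware analysis can be illustrated with a toy example. This sketch assumes a single indexed degradation rule dX[i]/dt = -k * X[i]; the function names are hypothetical and the code does not use the SBML arrays package or any simulator API.

```python
def flatten(n, k):
    """Expand one arrayed reaction into n scalar reactions (one closure per index)."""
    return [lambda x, i=i: -k * x[i] for i in range(n)]


def rates_flattened(reactions, x):
    """Evaluate every scalar reaction of the flattened model."""
    return [reaction(x) for reaction in reactions]


def rates_arrayed(n, k, x):
    """Evaluate the single indexed rule dX[i]/dt = -k * X[i] for all i."""
    return [-k * x[i] for i in range(n)]


n, k = 5, 0.1
x = [float(i + 1) for i in range(n)]

flat_model = flatten(n, k)
print(len(flat_model), "scalar reactions after flattening versus 1 indexed rule")
print(rates_arrayed(n, k, x))
```

Both routes give the same rates, but the flattened model has to store and traverse n separate reaction objects, while the array-aware route keeps one rule and a loop bound.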

Semantics-based composition of CellML and SBML models
Maxwell L. Neal*, Daniel L. Cook, John H. Gennari: University of Washington

Christopher T. Thompson, Brian E. Carlson: University of Michigan

As biosimulation models grow in size and complexity, there is an increasing need for tools that will help organize model content and allow efficient model composition from reusable modeling components encoded in standard formats such as CellML and SBML. To address this need we have developed the SemSim framework, a logical model description architecture for organizing simulation models according to their semantics, i.e. their biological content. These semantics are primarily captured within SemSim models using composite annotations that combine biomedical ontology terms to create a precise, machine-readable definition of a model element. Once applied, these annotations allow investigations and compositions of models at the biological, rather than computational code, level of abstraction. We have demonstrated the power of this approach by integrating and re-encoding real-world CellML cardiomyocyte models with our SemGen software. In this talk I will illustrate how the SemSim framework helps automate model composition and will discuss our current efforts to build integrated models from reusable components encoded in CellML and SBML.

Common Pattern in Computational Models Reaction Networks
Ron Henkel, Fabienne Lambusch, Dagmar Waltemath, Wolfgang Müller

In Systems Biology, the number of available computational models is growing fast. Models are published in open repositories and standard formats. Analyzing models regarding their content, e.g. semantic annotations or structure of the encoded reaction network, is a prerequisite to understanding and, subsequently, reusing models.

We use a graph database for an enriched and integrated storage of model information, including reaction network, semantic annotation and ontologies, simulation descriptions, and relations to other models.

With these data at hand, we analyzed the biochemical reaction networks and derived 37 commonly used patterns that are shared by at least 350 models. We also found that the usage of the identified patterns is unequally distributed among the models. We hypothesize that this distribution is a key factor for model similarity.

Modeling crosstalk: Human mTOR signaling pathway as a center node
Namrata Tomar1, Rajat K. De2, and Julio-Vera Gonalez1

1Laboratory of Systems Tumor and Immune Biology, Department of Dermatology, University Hospital, Friedrich-Alexander-University Erlangen-Nurnberg, Hartmannstrasse 14, 91052 Erlangen, Germany 2Machine Intelligence Unit, Indian Statistical Institute, 203 B.T. Road, Kolkata 700108, India.

We have modeled the crosstalk phenomenon with the human mTOR signaling pathway as a center node, where all of the interacting pathways are treated as hypothetical interacting entities, termed ‘crosstalk nodes’.

Motivation: When a signaling pathway receives many inputs, there is a high chance that it becomes excessively activated. Therefore, to put a ‘brake’ on excessive activation, the pathway applies extra effort in the form of regulatory loops (Efeyan and Sabatini, 2010).

Objective: The purpose of the current study is to investigate the effects of crosstalk and of feedback inhibition on the pathway under study. We have modeled the crosstalk among the mTOR (considered as the central pathway), Insulin, Wnt and MAPK pathways.

Approach: We have applied Flux Balance Analysis (FBA), incorporating feedback inhibition into the methodology, and compared the behavior of the studied pathway with and without crosstalk. The major difference from typical FBA, and the significance of this study, is the simultaneous incorporation of a concentration factor, feedback inhibition and crosstalk into the model.

Results: With crosstalk nodes, we obtained higher concentrations of the regulators of the reactions that induce feedback inhibition in the pathway than in the pathway without crosstalk nodes. We have validated the results against existing experimental evidence.

Conclusion: This methodology is a novel way of pathway analysis, in which one can integrate two or more pathway processes simultaneously to observe the impact of one pathway process on another.
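For orientation, the textbook FBA problem that this approach builds on can be written as a linear program. The symbols are assumptions introduced here, not notation from the abstract: S is the stoichiometric matrix, v the vector of reaction fluxes, c the objective weights, and v^min, v^max the flux bounds. The concentration factor, feedback inhibition and crosstalk terms added by the authors are not reproduced.

```latex
\begin{aligned}
\max_{v}\quad & c^{\top} v \\
\text{subject to}\quad & S\,v = 0, \\
& v^{\min} \le v \le v^{\max}.
\end{aligned}
```

The equality constraint expresses the steady-state assumption that every internal species is produced and consumed at the same total rate.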

A Converter from the Systems Biology Markup Language to the Synthetic Biology Open Language
Tramy Nguyen, University of Utah and Chris Myers, University of Utah

Standards are important because they enable exchange and reproducibility of genetic designs. Two standards are discussed in this talk: the Synthetic Biology Open Language (SBOL) and the Systems Biology Markup Language (SBML). SBOL describes structural and basic qualitative behavioral aspects of a biological design. SBML is a standard for behavioral models of biological systems at the molecular level. Converting SBML to SBOL enables a consistent connection between behavioral and structural information about a biological design. This paper discusses converting an SBML model annotated with the Systems Biology Ontology (SBO) into an SBOL data file by inferring its structure and qualitative function.

MultiCellDS: Standardization of cell phenotypic data for data exchange
Samuel Friedman, University of Southern California

Systems of multiple cells (and multiple cell types) are of fundamental importance in biology. An increasing number of simulation engines are routinely modeling systems of millions of cells in diverse fields: bacteria, yeast, developmental biology, parts of organs, cancer, and others. Quantitatively understanding the emergent complexities of these systems requires a method to record data, both experimental and simulated. MultiCellDS (MultiCellular Data Standard) systematizes such data with two main data types: Digital Cell Lines, computerized analogues of experimental cell lines, and Digital Snapshots, a data format both for annotating experimental data and recording simulation data at a single time point. Recording metadata, microenvironmental conditions, and phenotypic data in a consistent manner enables us to integrate information previously found in disparate, hard-to-find sources, and allows for consistent initialization of computer simulations. We present our recent work on these parts, including the development of 50+ digital cell lines, a data repository, and our new standard for recording and representing digital snapshots. MultiCellDS has evolved from a single lab’s grassroots project to a true community effort of 30+ experimentalists, clinicians, and computational modelers. This community has contributed significant feedback and improvement to develop consensus on the standard and ensure its adoption. The community is now contributing new digital cell lines for bacteria, yeast, endothelial cells, and multiple cancers, and the authors of multiple simulation engines have begun implementing the digital snapshot standard. We envision MultiCellDS will facilitate creation of community-developed software tools to help read, visualize, explore, analyze, and integrate multicellular data. In turn, these tools will promote better data quality, greater integration with ontologies, and the development of an API for I/O. Ultimately, these new specifications will help advance us towards high-throughput personalized multicellular computer simulations of cancer and other diseases, faster comparison of model results with experimental and clinical data, and ensembling of simulation predictions.

Bridging the Computational Modelling and EHR standards using openEHR and Semantic Web Technology
Koray Atalag, Aleksandar Zivaljevic, David Nickerson*, Auckland Bioengineering Institute, University of Auckland

Linking clinical data to computational physiology will enable real-world model validation as well as the possibility of personalised and population-level predictive decision support tools. When clinical data are structured, electronic health records (EHR) embody quantifiable manifestations of genomic and environmental aspects that impact on biological systems. However, data quality and semantic interoperability remain major challenges in the world of EHRs. In the computational physiology domain, recent attempts to enable semantic interoperability rely heavily on Semantic Web technologies and utilise ontology-based annotations (e.g. RICORDO), but a wealth of useful information and knowledge sits in EHRs, where Semantic Web technologies have very limited use. openEHR provides a set of open engineering specifications that define a canonical health record architecture, together with open source tooling to support data collection and integration. Core openEHR specifications have also been adopted by ISO and CEN, making it a full international standard which underpins many national programs and has multi-vendor implementations worldwide. Our work describes how to use openEHR to normalise, annotate and link clinical data with biophysical models by using openEHR Archetypes as semantic pointers to underlying clinical concepts in the EHR.

COMBINE archive meta data
Martin Scharm, Martin Peters, Ron Henkel, Dagmar Waltemath
Dept. of Systems Biology and Bioinformatics, University of Rostock, Germany

The current COMBINE archive meta data is restricted to a subset of RDF vocabulary, based on e.g. Dublin Core and vCard. This prevents tools from recording valuable provenance information.

In this talk we will introduce the current format, discuss the limitations, and present recommendations for an enhanced meta data scheme. Our recommendations include ideas and workarounds for extensions of the current meta data format. We will demonstrate the advantages of such a format using a large collection of real-life examples, based on the 'all singing, all dancing' showcase. The given examples focus on technical solutions (linking data files, annotation encoding etc) rather than on specific RDF issues (negative statements, open/closed world).

While we will be talking about the COMBINE archive, the proposed scheme can be used to provide meta data about any COMBINE related standard.

libSBOLj 2.0: A Java Library to Support SBOL 2.0
Zhen Zhanga, Tramy Nguyena, Nicholas Roehnerb, Goksel Misirlic, Matthew Pocockd, Ernst Oberortnere, Jacob Bealf, Kevin Clancyg, Anil Wipatc, Chris Myersa

aElectrical and Computer Engineering Department, University of Utah, Salt Lake City, UT 84112, USA, bBoston University, Boston, MA 02215, USA, cInterdisciplinary Computing and Complex BioSystems Research Group, School of Computing Science, Newcastle University, Newcastle upon Tyne, UK, dTuring Ate My Hamster, Ltd., Newcastle upon Tyne, UK, eDOE Joint Genome Institute, Walnut Creek, CA 94598, USA, fRaytheon BBN Technologies, Cambridge, MA 02138, USA, gThermoFisher Scientific Synthetic Biology Unit, Carlsbad, CA 92008, USA

The synthetic biology open language (SBOL) is an emerging data standard for representing synthetic biology designs. The goal of SBOL is to improve the reproducibility of these designs and their electronic exchange between researchers and genetic design automation (GDA) tools. The latest proposed version, SBOL 2.0, enables the annotation of a variety of biological components (e.g., DNA, mRNA, proteins, small molecules) and their interactions. In addition, SBOL 2.0 enables researchers to organize these components into hierarchical modules and publish the specification of their intended function. Using SBOL 2.0, these modules can further be linked to models in order to describe their behavior mathematically. To promote the use of SBOL 2.0, we are developing a Java library --- libSBOLj 2.0 --- that provides an easy to use application programming interface (API) for developers, including serialization to and from an RDF/XML file format. In addition, the library supports the conversion from SBOL 1.1 to SBOL 2.0. This paper describes the libSBOLj 2.0 library and the decisions involved in its design.

libRoadRunner: High-Performance Timecourse Simulation and Model Fitting
J Kyle Medley, Wilbert Copeland, Kiri Choi, Stanley Gu, Madhav Murthy, Kaylene Stocking and Herbert Sauro, University of Washington, Seattle

libRoadRunner is an SBML-compliant simulator which translates SBML models into machine code by utilizing just-in-time (JIT) compilation via the LLVM library. This approach offers highly performant model simulation across a wide range of devices, and enables the high throughput necessary to perform parameter sweeps, fitting, and optimization tasks. We demonstrate the utility of this approach via a parameter fitting task using evolutionary optimization methods to optimize parameter values on a population of models. We also show that libRoadRunner can be used as a distributed computing engine, and we discuss how the library's modular design enables new functionality to be added easily. Finally we describe libRoadRunner's automatically generated language APIs, which enable it to be used in a variety of programming environments.
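The shape of a parameter fitting task driven by evolutionary optimization over a population of candidate models can be sketched with the standard library alone. This is a toy illustration and does not call libRoadRunner: the `simulate` function is a hypothetical stand-in for a compiled SBML model, and the population size, mutation width and selection scheme are assumptions chosen for the example.

```python
import math
import random


def simulate(k, times):
    """Toy model standing in for a compiled SBML model: x(t) = exp(-k * t)."""
    return [math.exp(-k * t) for t in times]


def loss(k, times, data):
    """Sum of squared differences between simulated and observed time courses."""
    return sum((m - d) ** 2 for m, d in zip(simulate(k, times), data))


def fit(times, data, pop_size=20, keep=5, generations=50, sigma=0.05, seed=0):
    """Elitist evolutionary search over the single rate constant k."""
    rng = random.Random(seed)
    population = [rng.uniform(0.0, 2.0) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda k: loss(k, times, data))
        parents = population[:keep]  # elitism: the best candidates survive unchanged
        children = [
            max(0.0, rng.choice(parents) + rng.gauss(0.0, sigma))  # Gaussian mutation
            for _ in range(pop_size - keep)
        ]
        population = parents + children
    return min(population, key=lambda k: loss(k, times, data))


times = [float(t) for t in range(10)]
data = simulate(0.5, times)  # synthetic, noise-free data generated with k = 0.5
k_fit = fit(times, data)
print("fitted k:", k_fit)
```

Each generation evaluates the whole population, so the cost of a single simulation dominates the run time; this is the step where a fast simulator matters most.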

Making modelling standards more attractive to the community
J.L. Snoep, J.J. Eicher, D.D. van Niekerk, D. Waltemath, N. Stanford, S. Owen, W. Mueller, C. Goble.

Community uptake of computational modelling standards is a difficult and slow process. Approaches to increase the uptake speed fall into two categories, which can be classified as either "stick" or "carrot" approaches. Stick approaches might appear more straightforward, e.g. simply force scientists to use standards before acceptance of a scientific publication. However, stick approaches are not very attractive as a model, and typically make scientists adapt to the standard description format after the study has been concluded, so that they do not make full use of the available simulation tools during the study. In addition, it is not always trivial to check submitted documents for correctness; for instance, authors will submit an SBML file for a model description, but who will check that the model reproduces the published simulation results? We have more than 10 years of experience working with a number of scientific journals to ensure that published model simulation results are reproduced by the submitted models. Model curation is a time-consuming process, but the authors' willingness to solve problems before publication is a great advantage. We now have well-developed plans for the inclusion of URL links for model simulation figures that are linked to SED-ML scripts, and these can include experimental data sets. We will illustrate this for several publications of kinetic and FBA models. Lastly, we will illustrate a more attractive carrot approach using FAIRDOM-SEEK as a data and model management platform. Here automated versioning, easy tools for annotation, linking of data and models, and a push-to-publish functionality make adhering to standards easy and more functional.

Identifiers.org Update
Nick Juty, EMBL-EBI, Hinxton, Cambridge, UK. CB10 1SD

A service to generate robust and perennial identifiers for data records in the Life Sciences is crucial, and must address two major issues: a) unambiguous association of an identifier to the appropriate dataset (e.g. '9606' identifies "Homo sapiens" in the NCBI Taxonomy, but also "Catha edulis" in the plant-based GRIN Taxonomy), b) the ability to deal with mirrors, replicated data, and individual URLs (including legacy artifacts). In order to address these and other issues, an effort was launched in 2006 to provide a system through which appropriate URIs (Uniform Resource Identifiers) could be generated. The Identifiers.org resolving system is designed to support the use of HTTP URIs directly for both annotation and cross-referencing purposes. These URIs are directly incorporable in datasets, increase usability by software tools in their processing and display, and are resolvable by the end user, since they can actually be used as they stand in web interfaces. Moreover, these URIs are free, and provide unique, perennial and location-independent identifiers. This infrastructure is already used very successfully by, for example, the computational modelling community, which requires the ability to perennially record cross-references and links to external data records. We describe the Identifiers.org referencing system, as well as recent improvements to the infrastructure which facilitate data cross-referencing and integration.
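The ambiguity of the bare identifier '9606' disappears once the data collection is made part of the URI. The sketch below assumes the URI form `http://identifiers.org/<namespace>/<identifier>` and the namespace strings `taxonomy` and `grin.taxonomy`; both should be checked against the Identifiers.org Registry before use, and the helper names are hypothetical.

```python
BASE = "http://identifiers.org"


def make_uri(namespace, identifier):
    """Combine a collection namespace and a local identifier into one HTTP URI."""
    return "%s/%s/%s" % (BASE, namespace, identifier)


def split_uri(uri):
    """Recover (namespace, identifier) from a URI built by make_uri."""
    namespace, identifier = uri[len(BASE) + 1:].split("/", 1)
    return namespace, identifier


human = make_uri("taxonomy", "9606")       # NCBI Taxonomy entry (assumed namespace)
khat = make_uri("grin.taxonomy", "9606")   # GRIN Taxonomy entry (assumed namespace)

print(human)
print(khat)
```

The same local identifier thus yields two distinct URIs, each of which names exactly one record regardless of where the data are physically hosted.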

Identifiers.org: cross-referencing and integration of heterogeneous datasets
Nick Juty1, Sarala Wimalaratne1, Nicolas Le Novère2 and Henning Hermjakob1

1EMBL-EBI, Hinxton, Cambridge, UK. CB10 1SD.
2Babraham Institute, Babraham Research Campus, Cambridge, UK. CB22 3AT.

The Identifiers.org Registry is a catalogue of data collections (corresponding to controlled vocabularies or databases). It stores a variety of useful information such as a description of the dataset, the physical (access) URLs through which collection data can be retrieved, and the identifier patterns that are used by the collection. The information stored allows the construction of robust cross-references and annotations, composed of resolvable URIs, which are perennial and location-independent. We describe the strategy used to create this cross-referencing and annotation system, and how it can be used, for example, to access different data formats, or preferred resolving locations. We also describe its use in data integration, in particular with respect to Linked Data.
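How a registry record of this kind supports validation and the choice of a resolving location can be sketched as follows. The record layout, the `resolve` helper and the example.org access URLs are all hypothetical placeholders, not the Registry's actual schema or endpoints; only the idea of a per-collection identifier pattern and several physical locations is taken from the description above.

```python
import re

# Hypothetical registry record; the access URLs are placeholders, not real endpoints.
REGISTRY = {
    "taxonomy": {
        "description": "NCBI Taxonomy",
        "pattern": r"^\d+$",
        "resources": [
            "https://example.org/primary/{id}",
            "https://example.org/mirror/{id}",
        ],
    },
}


def resolve(namespace, identifier, preferred=0):
    """Validate the identifier against the collection pattern, then build an access URL."""
    entry = REGISTRY[namespace]
    if not re.match(entry["pattern"], identifier):
        raise ValueError("%r does not match the pattern for %s" % (identifier, namespace))
    return entry["resources"][preferred].format(id=identifier)


print(resolve("taxonomy", "9606"))               # default resolving location
print(resolve("taxonomy", "9606", preferred=1))  # a preferred mirror
```

Because the annotation stores only the namespace and identifier, the physical locations can change in the registry without invalidating existing cross-references.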

JUMMP: a generic and modular model management platform for life sciences
Mihai Glont1, Nick Juty1, Nicolas Le Novère2 and Henning Hermjakob1

1EMBL-EBI, Hinxton, Cambridge, UK. CB10 1SD.
2Babraham Institute, Babraham Research Campus, Cambridge, UK. CB22 3AT.

Computational models are a widely used tool for the study of complex systems in life sciences. In order to facilitate model reuse, the community needs trustworthy and reliable models encoded in standard formats and, to this end, resources such as BioModels Database, JWS Online and the Physiome Model Repository were created. There is now a need for model management infrastructures that support new ways of building models, involving several people in different locations. Some have already started to tackle this issue, such as the Open Source Brain, which relies on existing version control systems to support worldwide collaboration. An additional consideration is that the size, complexity and number of models generated are ever increasing. This places a heavy burden on existing infrastructures and makes it difficult for search engines to supply meaningful results. Moreover, some models are now being produced semi-automatically from pathway and reaction resources, as in Path2Models [Büchel2013]. These challenges are not faced solely by the Systems Biology community; in the field of Pharmacometrics, where the predominance of proprietary tools fosters the fragmentation of the landscape and thwarts model reusability, the Drug Disease Model Resources (DDMoRe) project [Harnisch2013] has been established. With these factors in mind, we decided to collaboratively develop the next generation model management infrastructure: Jummp (JUst a Model Management Platform). It has been designed as a complete web based solution for model development, curation and provision. It features a modular architecture consisting of a small core, responsible for the interaction with the back-end storage infrastructure, on top of which various components can be plugged in at runtime.
General tasks pertaining to model management, including the storage, versioning, sharing, searching, retrieval or archival of models, are abstracted from the model format, allowing the application to remain as generic as possible and to avoid unnecessary coupling. The content of the platform can be explored through a user interface, as well as programmatically, via a RESTful API supporting XML and JSON. As models are updated, users are still able to access previous versions, providing details of model evolution. Multiple model formats are supported, including SBML, PharmML and the COMBINE archive, with additional standards able to be incorporated as required. Secure access to models is one of the key traits of Jummp, which provides a fine-grained control mechanism to the user. A glimpse into the platform’s capabilities is provided by the DDMoRe Model Repository (http://repository.ddmore.eu/), the first public repository running Jummp. In the near future, the platform will also be used to power BioModels Database (http://wwwdev.ebi.ac.uk/biomodels/jummp-biomodels/). These two resources showcase the versatility of Jummp’s infrastructure.
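The idea of keeping versioning independent of the model format can be illustrated with a short sketch. This is not Jummp's code or API: the names `ModelStore` and `Revision` are hypothetical, and the store simply treats every model as opaque bytes tagged with a format label.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Revision:
    """One immutable version of a model; the content is never parsed here."""
    number: int
    fmt: str          # e.g. "SBML" or "PharmML"; only a label to the store
    content: bytes
    comment: str


class ModelStore:
    """In-memory, format-agnostic versioned storage (hypothetical sketch)."""

    def __init__(self) -> None:
        self._revisions: Dict[str, List[Revision]] = {}

    def commit(self, model_id: str, fmt: str, content: bytes, comment: str = "") -> Revision:
        """Append a new revision; earlier revisions are never overwritten."""
        history = self._revisions.setdefault(model_id, [])
        revision = Revision(len(history) + 1, fmt, content, comment)
        history.append(revision)
        return revision

    def get(self, model_id: str, number: Optional[int] = None) -> Revision:
        """Return the latest revision, or a specific earlier one by number."""
        history = self._revisions[model_id]
        if number is None:
            return history[-1]
        if not 1 <= number <= len(history):
            raise IndexError(f"no revision {number} for {model_id!r}")
        return history[number - 1]

    def history(self, model_id: str) -> List[Revision]:
        """Return a copy of the revision list, oldest first."""
        return list(self._revisions[model_id])


store = ModelStore()
store.commit("model-1", "SBML", b"<sbml>v1</sbml>", "initial submission")
store.commit("model-1", "SBML", b"<sbml>v2</sbml>", "corrected parameters")
latest = store.get("model-1")
first = store.get("model-1", number=1)
```

Since the store never inspects the content, supporting an additional format in this sketch means adding a parser or validator elsewhere, without touching the storage or versioning logic.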

10 simple rules for making data web friendly
Nick Juty1, Julie McMurry2,3, BioMedBridges Consortium4, Melissa Haendel2,3, Carole Goble5,6, Helen Parkinson1

1EMBL-EBI, Hinxton, Cambridge, UK. CB10 1SD.
2Department of Medical Informatics and Epidemiology and OHSU Library, Oregon Health & Science University, Portland, USA.
3Monarch Initiative; http://monarchinitiative.org/
4BioMedBridges Consortium; http://www.biomedbridges.eu/partners
5School of Computer Science, The University of Manchester, Manchester, UK.
6Open PHACTS; http://www.openphactsfoundation.org/

As the quantity and complexity of Life Science data continue to grow, so too does its availability through the web. This can cause integration issues where individual record or dataset identification was not conceived for use in a global data landscape. These can manifest as identifier or namespace collisions, resulting in erroneous integration of data, or simply as “link rot” and “content drift”. As a consequence, there is often a need to provide additional and potentially costly procedures for the source-dependent processing of data, or else the provision of alternative mapping solutions. BioMedBridges, in association with an international community of partners, has begun to determine the common technical bridges required to allow data integration across the biological domain. Based on our experience we describe ten simple rules for best practice in the provision and reuse of identifiers for web-based Life Science data. We further elicit feedback from the community, through the use of an interactive exercise, to determine both their perceived ranking of these rules, and their actual experiences when encountering web-based data.